UrduAI: Writeprints for Urdu Authorship Identification

Abstract

The authorship identification task aims at identifying the original author of an anonymous text sample from a set of candidate authors. It has several application domains, such as digital forensics and information retrieval. These domains are not limited to a specific language. However, most studies have focused on English, and limited attention has been paid to Urdu. Existing Urdu solutions drop in accuracy as the number of training samples per candidate author reduces and when the number of candidate authors increases. Consequently, these solutions are inapplicable to real-world cases. Moreover, due to the unavailability of reliable POS taggers or sentence segmenters, all existing studies on Urdu text are limited to word n-gram features only. To overcome these limitations, we formulate a stylometric feature space, which is not limited to word n-gram features. Based on this feature space, we use an authorship identification solution that transforms each text sample into a point set, retrieves candidate text samples, and relies on the nearest neighbors classifier to predict the original author of the anonymous text sample. To evaluate our solution, we create a significantly larger corpus than existing studies and conduct experimental studies that show our solution can overcome these limitations and report an accuracy level of 94.03%, which is higher than all previous works.
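The pipeline named in the abstract (text sample to point set, then a nearest neighbors prediction) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names (`window_features`, `to_point_set`, `set_distance`, `predict_author`), the three toy features, the whitespace tokenization, the fixed window size, and the set-to-set distance are all hypothetical stand-ins for the paper's stylometric feature space and retrieval step.

```python
import math
from collections import Counter


def window_features(tokens):
    """Map one window of tokens to a toy stylometric vector.

    Illustrative features only (assumed, not the paper's feature space):
    average token length, type-token ratio, share of very short tokens.
    """
    n = len(tokens)
    avg_len = sum(len(t) for t in tokens) / n
    type_token_ratio = len(set(tokens)) / n
    short_rate = sum(1 for t in tokens if len(t) <= 2) / n
    return (avg_len, type_token_ratio, short_rate)


def to_point_set(text, window=5):
    """Turn a text sample into a point set: one feature vector per window.

    Whitespace tokenization is an assumption made to keep the sketch short.
    """
    tokens = text.split()
    if not tokens:
        raise ValueError("text sample contains no tokens")
    windows = [tokens[i:i + window] for i in range(0, len(tokens), window)]
    return [window_features(w) for w in windows]


def set_distance(a, b):
    """Average distance from each point in a to its nearest point in b (assumed metric)."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def predict_author(anonymous_text, training_samples, k=1):
    """Return the majority author among the k training samples nearest to the query."""
    query = to_point_set(anonymous_text)
    scored = sorted(
        (set_distance(query, to_point_set(text)), author)
        for author, text in training_samples
    )
    nearest_authors = [author for _, author in scored[:k]]
    return Counter(nearest_authors).most_common(1)[0][0]


# Hypothetical training data: (author label, text sample) pairs.
samples = [
    ("A", "aa bb cc dd ee"),
    ("B", "longword longword longword longword longword"),
]
print(predict_author("xx yy zz ww vv", samples))
```

The query shares no vocabulary with either training sample, so the prediction depends only on the style features, which is the general idea behind a stylometric feature space as opposed to word n-grams alone.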

Similar resources

Visualizing Authorship for Identification

As a result of growing misuse of online anonymity, researchers have begun to create visualization tools to facilitate greater user accountability in online communities. In this study we created an authorship visualization called Writeprints that can help identify individuals based on their writing style. The visualization creates unique writing style patterns that can be automatically identifie...

N-Gram Based Authorship Attribution in Urdu Poetry

Authorship attribution is an interesting problem in Computational Linguistics. Traditional author recognition systems for electronic text rely on techniques which train the system to the specific vocabulary and writing style of the writer and apply stochastic methods to judge a given text at byte, letter or word levels. In this paper we have developed a software system to apply one existing and...

Detecting authorship deception: a supervised machine learning approach using author writeprints

We describe a new supervised machine learning approach for detecting authorship deception, a specific type of authorship attribution task particularly relevant for cybercrime forensic investigations, and demonstrate its validity on two case studies drawn from realistic online data sets. The core of our approach involves identifying uncharacteristic behavior for an author, based on a writeprint ...

Authorship Identification for Heterogeneous Documents

The study of authorship identification in Japanese has for the most part been restricted to literary texts using basic statistical methods. In the present study, authors of mailing list messages are identified using a machine learning technique (Support Vector Machines). In addition, the classifier trained on the mailing list data is applied to identify the author of Web documents in order to i...

Unsupervised Method for the Authorship Identification Task

This paper presents an approach for tackling the authorship identification task. The approach is based on comparing the similarity between a given unknown document against the known documents using a number of different phrase-level and lexical-syntactic features, so that an unknown document can be classified as having been written by the same author, if the different similarity measures obtain...

Journal

Journal title: ACM Transactions on Asian and Low-Resource Language Information Processing

Year: 2021

ISSN: 2375-4699, 2375-4702

DOI: https://doi.org/10.1145/3476467